Parametrised Hausdorff Distance as a Non-Metric Similarity Model for Tandem Mass Spectrometry
Abstract
Tandem mass spectrometry is a widely used method for identifying protein and peptide sequences. Because mass spectra contain up to 80% noise and many other inaccuracies, more accurate algorithms for interpreting them are still needed. Protein databases are growing rapidly, and methods that index these databases in order to interpret mass spectra are becoming very popular. This paper presents the parametrised Hausdorff distance, which is suitable for non-metric search. It models the similarity among tandem mass spectra very well and, in many cases, can match a spectrum to the correct peptide sequence without any post-processing scoring system.
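For orientation, the classical (unparametrised) Hausdorff distance between two peak lists can be sketched as follows. This is only an illustration of the underlying idea: the abstract does not give the parametrised form, so none is assumed here, and the function names and the example m/z values are hypothetical.

```python
from typing import Sequence


def directed_hausdorff(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest distance from any peak in `a` to its nearest peak in `b`.

    Both peak lists must be non-empty; an empty list raises ValueError.
    """
    return max(min(abs(x - y) for y in b) for x in a)


def hausdorff(a: Sequence[float], b: Sequence[float]) -> float:
    """Classical symmetric Hausdorff distance between two peak lists."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


# Hypothetical m/z values, chosen only to illustrate the computation.
query_peaks = [100.0, 200.0, 300.0]
candidate_peaks = [101.0, 200.0, 350.0]
distance = hausdorff(query_peaks, candidate_peaks)
```

Because the outer maximum is driven by the single worst-matched peak, one spurious noise peak can dominate the classical distance, which is one reason a modified variant may suit noisy spectra better.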
Similar resources
Non-metric similarity search of tandem mass spectra including posttranslational modifications
In biological applications, tandem mass spectrometry is a widely used method for determining protein and peptide sequences from an "in vitro" sample. The sequences are not determined directly; they must be interpreted from the mass spectra, which are the output of the mass spectrometer. This work is focused on a similarity-search approach to mass spectra interpretation, where the paramet...
On Comparison of SimTandem with State-of-the-Art Peptide Identification Tools, Efficiency of Precursor Mass Filter and Dealing with Variable Modifications
The similarity search in theoretical mass spectra generated from protein sequence databases is a widely accepted approach for identification of peptides from query mass spectra produced by shotgun proteomics. Growing protein sequence databases and noisy query spectra demand database indexing techniques and better similarity measures for the comparison of theoretical spectra against query spectr...
Riemannian manifolds, spaces of measures and the Gromov-Hausdorff distance
We equip the space M(X) of all Borel probability measures on a compact Riemannian manifold X with a canonical distance function which induces the weak-∗ topology on M(X) and has the following property: the map X ↦ M(X) is Lipschitz continuous with respect to the Gromov-Hausdorff distance on the space of all the (isometry classes of) compact metric spaces. Introduction Last century brought sever...
Mining Mass Spectra: Metric Embeddings and Fast Near Neighbor Search
Mining large-scale high-throughput tandem mass spectrometry data sets is a very important problem in mass spectrometry based protein identification. One of the fundamental problems in large scale mining of spectra is to design appropriate metrics and algorithms to avoid all-pair-wise comparisons of spectra. In this paper, we present a general framework based on vector spaces to avoid pair-wise ...
Publication date: 2010